Summary

Our group chose to analyze data from the Boston Marathon for 2015-2017. Most of our work focuses on 2017, but we also conducted analysis across all three years of data. Each year's dataset has about 26,000 rows and 25 fields. Fields include demographic information for each runner (age, gender, hometown and/or country), the runner's finishing time, and the split times at which the runner reached certain milestones along the course.

For a more detailed overview of our data, please see section III.

Overarching SMART Question

Our group was interested in determining what makes a runner fast, based on the data we have. How much of a role, if any, do gender, age, past runnings of Boston, nationality, or home state play in determining a runner’s final finishing time? To answer this, our team broke the question into the following component pieces.

Data Dictionary

It looks like there are several fields that we do not need or that are blank throughout the dataset. Let’s see what each column is doing:

Note on Bib Numbers: Bib numbers are given out in ranges based on the runner’s fastest qualifying time. Below is the info for 2017 (http://registration.baa.org/2017/cf/Public/iframe_EntryLists.cfm):

Bib numbers are color coded:

  • Red bibs (numbers 101 to 7,700) are assigned to Wave 1 (10:00 a.m.).
  • White bibs (numbers 8,000 to 15,600) are assigned to Wave 2 (10:25 a.m.).
  • Blue bibs (numbers 16,000 to 23,600) are assigned to Wave 3 (10:50 a.m.).
  • Yellow bibs (numbers 24,000 to 32,500) are assigned to Wave 4 (11:15 a.m.).

The breaks between waves are:

  • Wave 1 and Wave 2: 3:10:43.
  • Wave 2 and Wave 3: 3:29:27.
  • Wave 3 and Wave 4: 3:57:18.
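The 2017 bib ranges listed above amount to a simple lookup. The sketch below is in Python (the report's own analysis was run in R), and the name `wave_for_bib` is hypothetical and not part of the original analysis. It returns `None` for any number outside the listed ranges, including the gaps between them.

```python
# 2017 bib ranges as listed above: (first bib, last bib, wave number, bib color)
WAVES_2017 = [
    (101, 7700, 1, "red"),
    (8000, 15600, 2, "white"),
    (16000, 23600, 3, "blue"),
    (24000, 32500, 4, "yellow"),
]


def wave_for_bib(bib):
    """Return (wave, color) for a bib number, or None if it is outside the listed ranges."""
    for first, last, wave, color in WAVES_2017:
        if first <= bib <= last:
            return wave, color
    return None


print(wave_for_bib(8500))
print(wave_for_bib(7800))  # falls in the gap between the red and white ranges
```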

Analysis of Gender and Age

SMART Questions

  • Do men and women perform differently? Are they different in age? Are these differences statistically significant?

  • Is there a relationship between performance at the half (minutes taken to reach the halfway point) and official time (minutes taken to finish the race)?

  • Does performance in each quarter of the race (rank within each quarter, based only on the time taken to run that quarter) affect final rank?

Do men and women perform differently? Are their ages different?

  • There are 11972 female and 14438 male participants in the data frame.

  • Women’s mean age (39.9) is significantly different from men’s mean age (44.8) (p<0.001).

  • Women’s average official time (249 minutes) is significantly different from men’s average official time (229 minutes) (p<0.001).

  • Women’s average half time (117 minutes) is significantly different from men’s average half time (105 minutes) (p<0.001).

Can half time predict official time?

  • In a model of official time on half time, the intercept is -3.04 and is significant (p<.001). Each one-minute increase in half time is associated with a 2.18-minute increase in official time (p<0.001). Adjusted R squared is high at 0.894.

Can age predict official time?

  • In a model of official time on age, the intercept is 202.66 and is significant (p<.001). Each additional year of age is associated with a 0.8313-minute increase in official time (p<0.001). Adjusted R squared is low at 0.0507.
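To make the two simple regressions above concrete, here is a small Python sketch (the report itself was run in R) that plugs values into the fitted lines. It assumes that -3.04 and 202.66 are the intercepts and that 2.18 and 0.8313 are the slopes, which is how the reported figures read. The function names are hypothetical.

```python
def predict_from_half(half_min):
    """Official time (min) predicted from half time (min): intercept -3.04, slope 2.18."""
    return -3.04 + 2.18 * half_min


def predict_from_age(age_years):
    """Official time (min) predicted from age (years): intercept 202.66, slope 0.8313."""
    return 202.66 + 0.8313 * age_years


# Average half times reported above: 105 minutes (men) and 117 minutes (women)
print(round(predict_from_half(105), 2))
print(round(predict_from_half(117), 2))
print(round(predict_from_age(40), 2))
```

With the men's average half time of 105 minutes, the first model predicts about 225.9 minutes, close to the men's observed average of 229. For the women's 117 minutes it predicts about 252.0, against an observed 249.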

Do ranks during each quarter of the race predict overall rank? Which quarter is the best predictor for the overall rank?

  • In a model of overall rank on rank within each interval of the race (0-10K, 10-20K, 20-30K, 30-42.195K), the intercept is -.0686 and is significant (p<.001).
  • A one-place increase in rank during the first quarter of the race (0-10K) is associated with a 0.186-place increase in overall rank (p<0.001). For the second quarter (10-20K) the increase is 0.175 (p<0.001), and for the third quarter (20-30K) it is 0.291 (p<0.001). The final quarter has the largest effect on overall rank: a one-place increase during the last quarter (30-42.195K) is associated with a 0.400-place increase in overall rank (p<0.001).
  • Adjusted R squared is high at 0.99. There is a concerning amount of collinearity for rank between 10-20K (vif=23.77) and rank between 20-30K (vif=17.98).
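To put the VIF values above in context: the variance inflation factor of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. The Python sketch below (the report's analysis was run in R, and the function name is hypothetical) inverts that formula to show how much of each quarter's rank the other quarters' ranks explain.

```python
def r_squared_from_vif(vif):
    """Invert VIF = 1 / (1 - R^2) to get the R^2 of a predictor regressed on the others."""
    return 1.0 - 1.0 / vif


# VIF values reported above
for label, vif in [("rank 10-20K", 23.77), ("rank 20-30K", 17.98)]:
    print(label, round(r_squared_from_vif(vif), 3))
```

Both values work out to an R² above 0.94, meaning each of these ranks is almost entirely predictable from the other three, which is why their individual coefficients should be read with caution.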

Additional Analysis of Age

SMART Questions

Do average finishing times differ across age groups?

Are there any trends across age groups?

Is age really a factor in the finishing time of a marathon (or at least of this Boston Marathon)?

Descriptive Statistics

Let us add a column to the existing dataframe that gives the age group each runner belongs to. We divided the age groups into 5-year bands, following the USATF (USA Track & Field) standard, while also keeping in mind the number of runners in each age group. In the 2017 Boston Marathon, the minimum participant age is 18 and the maximum is 84. Marathons and other long-distance events often use 19 and under as the youngest age group. With this in mind, we divided the runners into the age groups 18-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, and 60+.
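The grouping described above can be sketched as a lookup on the lower bound of each band. This is a Python illustration only (the report added the column in R), and `age_group`, `BOUNDS`, and `LABELS` are hypothetical names.

```python
import bisect

# Lower bound of every group after the first; labels are the report's nine age groups
BOUNDS = [25, 30, 35, 40, 45, 50, 55, 60]
LABELS = ["18-24", "25-29", "30-34", "35-39", "40-44",
          "45-49", "50-54", "55-59", "60+"]


def age_group(age):
    """Map an age in years to its age-group label."""
    # bisect_right counts how many lower bounds are <= age, which is the label index
    return LABELS[bisect.bisect_right(BOUNDS, age)]


print([age_group(a) for a in (18, 24, 25, 59, 60, 84)])
```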

Let’s visualize the data. First, we look at the total number of runners in each age group.

Because we also took the number of runners into account when defining the groups, there are more than 1000 runners in each age group. Now, let’s look at the boxplot of official run time by age group to see the trends across age groups.

From the boxplot, we can observe that there isn’t much difference in the typical running time of the age groups from 18-24 through 45-49. Running times do start to differ from age 50 onward. The distributions of some age groups also look alike apart from a few outliers. The boxplot suggests that runners in age group 30-34 are faster than the other age groups and that runners in age group 60+ are the slowest. Now, let’s subset the data by age group before we go on to statistical tests.

Test for Normality

Below are the QQ plots for all these subsets to check if they are normal.

Let’s sample 50 observations from each age-group subset and bind them together into a new data frame.
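As an illustration of this sampling step, the Python sketch below draws a fixed number of rows from each group and binds them into one list. The report did this in R. The function name, the dictionary-based row layout, and the toy data are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_per_group(rows, key, n=50, seed=0):
    """Draw n rows without replacement from each group defined by row[key]."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for label in sorted(groups):
        # random.sample raises ValueError if a group has fewer than n rows
        sample.extend(rng.sample(groups[label], n))
    return sample


# Toy data (not the marathon data): 3 groups of 200 rows each
rows = [{"agegroup": g, "time": 200 + i}
        for g in ("18-24", "25-29", "60+") for i in range(200)]
picked = sample_per_group(rows, "agegroup")
print(len(picked))
```

With the report's nine age groups, 50 rows per group gives 450 rows in total, which is consistent with the 441 residual degrees of freedom (450 minus 9) in the ANOVA output that follows.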

ANOVA

Let’s perform an ANOVA test to compare the mean run times across age groups. Null hypothesis: there is no difference between the mean run times of the age groups; in other words, age has no impact on the finishing times of runners.

##              Df Sum Sq Mean Sq F value   Pr(>F)    
## agegroup      8  70939    8867   5.445 1.46e-06 ***
## Residuals   441 718133    1628                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 1.959398
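The F value in the table above is the ratio of the between-group mean square to the residual mean square. The Python sketch below recomputes it from the sums of squares and degrees of freedom shown in the output. It is a check on the arithmetic, not the R code that produced the table, and the function name is hypothetical.

```python
def f_statistic(ss_between, df_between, ss_within, df_within):
    """One-way ANOVA F: between-group mean square divided by within-group mean square."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within


# Sum Sq and Df values taken from the ANOVA table above
f_value = f_statistic(70939, 8, 718133, 441)
print(round(f_value, 3))
```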

According to the results of the ANOVA test, the p-value is 1.46e-06, which is less than the significance level of 0.05. We can formally reject the null hypothesis that there is no difference between the means. Let’s take a look at the Tukey comparison table to check the mean comparisons between age groups.

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Official.Time.Min ~ agegroup, data = sample_ag)
## 
## $agegroup
##                   diff         lwr       upr     p adj
## 25-29-18-24 -10.769000 -35.9276796 14.389680 0.9205640
## 30-34-18-24 -17.109333 -42.2680129  8.049346 0.4609946
## 35-39-18-24  -4.091000 -29.2496796 21.067680 0.9998883
## 40-44-18-24 -13.894667 -39.0533463 11.264013 0.7329886
## 45-49-18-24 -13.312667 -38.4713463 11.846013 0.7766550
## 50-54-18-24  -2.128000 -27.2866796 23.030680 0.9999993
## 55-59-18-24  11.556000 -13.6026796 36.714680 0.8850655
## 60+-18-24    23.675333  -1.4833463 48.834013 0.0836800
## 30-34-25-29  -6.340333 -31.4990129 18.818346 0.9972185
## 35-39-25-29   6.678000 -18.4806796 31.836680 0.9960066
## 40-44-25-29  -3.125667 -28.2843463 22.033013 0.9999858
## 45-49-25-29  -2.543667 -27.7023463 22.615013 0.9999972
## 50-54-25-29   8.641000 -16.5176796 33.799680 0.9780804
## 55-59-25-29  22.325000  -2.8336796 47.483680 0.1285818
## 60+-25-29    34.444333   9.2856537 59.603013 0.0008069
## 35-39-30-34  13.018333 -12.1403463 38.177013 0.7974242
## 40-44-30-34   3.214667 -21.9440129 28.373346 0.9999824
## 45-49-30-34   3.796667 -21.3620129 28.955346 0.9999367
## 50-54-30-34  14.981333 -10.1773463 40.140013 0.6442827
## 55-59-30-34  28.665333   3.5066537 53.824013 0.0125198
## 60+-30-34    40.784667  15.6259871 65.943346 0.0000224
## 40-44-35-39  -9.803667 -34.9623463 15.355013 0.9530719
## 45-49-35-39  -9.221667 -34.3803463 15.937013 0.9673233
## 50-54-35-39   1.963000 -23.1956796 27.121680 0.9999996
## 55-59-35-39  15.647000  -9.5116796 40.805680 0.5871250
## 60+-35-39    27.766333   2.6076537 52.925013 0.0182424
## 45-49-40-44   0.582000 -24.5766796 25.740680 1.0000000
## 50-54-40-44  11.766667 -13.3920129 36.925346 0.8741492
## 55-59-40-44  25.450667   0.2919871 50.609346 0.0449487
## 60+-40-44    37.570000  12.4113204 62.728680 0.0001479
## 50-54-45-49  11.184667 -13.9740129 36.343346 0.9028519
## 55-59-45-49  24.868667  -0.2900129 50.027346 0.0554874
## 60+-45-49    36.988000  11.8293204 62.146680 0.0002051
## 55-59-50-54  13.684000 -11.4746796 38.842680 0.7491657
## 60+-50-54    25.803333   0.6446537 50.962013 0.0394364
## 60+-55-59    12.119333 -13.0393463 37.278013 0.8545510

Here, every comparison with an adjusted p-value below the significance level involves age group 55-59 or 60+, and no pair among the younger age groups differs significantly. So, let’s omit the age groups 55-59 and 60+ and then see the trend among the remaining age groups. Let’s perform ANOVA again for the other age groups.

ANOVA

Null hypothesis: there is no difference between the mean run times of the age groups; in other words, age has no impact on the finishing times of runners.

##              Df Sum Sq Mean Sq F value Pr(>F)
## agegroup      6  13168    2195   1.308  0.253
## Residuals   343 575685    1678
## [1] 2.125037

After performing the test, we get a p-value of 0.253, so we fail to reject the null hypothesis that there is no difference between the means.

In conclusion, for the Boston Marathon, we find no evidence that age is a factor in finishing time for runners below 55.

Analysis of Past Races

SMART Questions

Which age group performs best over the years?

Are there any trends in the average official time for the four countries with the most runners over the three years?

Are the mean official times for 2015, 2016 and 2017 the same?

Do third-time runners perform better than first-time runners, comparing their average official times in 2017?

From the summary of the three years’ data, we can observe that the average age of the participants is about the same in all three years.

Now let’s divide age into groups and check which age group performs best over the years.

From the above three plots, we can observe that the 30-34 age group has the lowest average official time in every year. So, we can conclude that the 30-34 age group performs better than the other age groups in all three years.

Let us see if there is any trend in the Average official time for top four countries with highest number of runners over the three years.

We take the four countries with the most participants. USA, Canada, UK and Mexico have the highest number of runners in all three years.

As we can see from the plot, there is a very slight increase in average official time for these four countries over the three years.

Now let’s look at the ANOVA test comparing official time across the three years.

Null hypothesis: There is no difference in the mean official time of runners across the three years.

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Official.Time.Min ~ year, data = official_time_sample)
## 
## $year
##                 diff       lwr      upr     p adj
## 2016-2015  3.6366667 -15.18589 22.45922 0.8911584
## 2017-2015  3.2500000 -15.57256 22.07256 0.9120556
## 2017-2016 -0.3866667 -19.20922 18.43589 0.9986966

The p-value of 0.8814855 is greater than 0.05. Tukey’s results also show an adjusted p-value greater than 0.05 for every comparison. So, we fail to reject the null hypothesis: we find no evidence that the mean official time differs across the three years.

Now let’s see whether any runners participated in all three years.

A total of 5458 runners participated in both the 2015 and 2016 marathons, and 2268 runners participated in all three years.
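Finding runners who appear in more than one year is a set intersection on some runner identifier. The report does not say which key was used to match runners, so the identifiers in this Python sketch are purely hypothetical toy values, and the function name is an assumption.

```python
def repeat_runners(*years):
    """Return the identifiers that appear in every yearly collection."""
    return set.intersection(*(set(year) for year in years))


# Toy identifiers (hypothetical; the matching key used in the report is not stated)
y2015 = ["ann|US", "bob|CA", "cy|US", "di|GB"]
y2016 = ["ann|US", "bob|CA", "eve|US"]
y2017 = ["ann|US", "eve|US", "fay|MX"]

both_15_16 = repeat_runners(y2015, y2016)
all_three = repeat_runners(y2015, y2016, y2017)
print(len(both_15_16), len(all_three))
```

Whatever key is used, it needs to be stable across years. A name on its own can collide between different runners, which is why the toy identifiers combine two fields.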

Now let’s check whether the runners who participated in all three years perform better than first-time runners by comparing average official times in 2017.

Null hypothesis for the z-test: There is no difference in the average official times of first-time and third-time runners.

Alternative hypothesis: The average official time of first-time runners is greater than the average official time of runners who participated in all three years. Alpha: 0.05.

Since the p-value is less than the alpha of 0.05, we reject the null hypothesis. That means the average official time of third-time runners is lower than that of first-time runners, so we can conclude that third-time runners perform better than first-time runners.
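For readers who want to see the mechanics of a one-sided two-sample z-test like the one described above, here is a minimal Python sketch working from summary statistics. The report ran its test in R and does not list the group means or standard deviations, so the numbers passed in below are invented toy values, not results from the data. The function name is hypothetical.

```python
import math


def one_sided_z_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """z statistic and p-value for the alternative mean_a > mean_b (independent samples)."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / se
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability of a standard normal
    return z, p


# Toy summary statistics (NOT from the marathon data): group A = first-time, B = third-time
z, p = one_sided_z_test(240.0, 40.0, 100, 230.0, 40.0, 100)
print(round(z, 3), round(p, 4))
```

The p-value is the upper-tail area because the alternative hypothesis is directional: only a first-time average that is larger than the third-time average counts as evidence against the null.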

Analysis of Nationality/Continent

SMART Questions

I focus on exploring whether nationality affects marathon time. Specific questions include:

(1) Is marathon time the same across nationalities/continents? If not, how does it differ?

(2) Among the ‘fast runners’ group (top 1/4), is marathon time still the same or different? What does the difference look like?

(3) Are nationality/continent and finish time independent?

Expectation and Data Exploration

Before exploring the data, we expect that the African group will be the fastest and that average finish times will differ between groups. We also expect finish time not to be independent of nationality/continent.

First we will take a look at the distribution of finish times among different countries.

From the plot of finish time by nationality, we can see that finish time differs among country groups. But since more than 90 countries are included, the plot is hard to read and to explore further. So we decided to group countries by continent. Since American runners are the most numerous, we list USA as its own group within Continent.

We installed the countrycode library, which helps us convert country codes to continents. After checking, we found that some country codes are not covered by countrycode, so we reviewed the remaining country codes and fixed them by hand. We now have a new column called Continent, with six groups in total: Asia, Americas, Europe, USA, Africa and Oceania.

By continent, 20945 runners are from the U.S., 2944 are from the Americas, 1614 are from Europe, 681 are from Asia, 194 are from Oceania, and 32 are from Africa.

Then we explore the plot of finish time by continent. The boxplot shows that the average finish time in the Africa group is much lower than in the others. Every group except Africa has outliers. We expect the Africa group’s average time to be lower than and different from the others, so we run an ANOVA test to check whether our expectation is correct.

Before the ANOVA test, we first check whether each group is normally distributed. Every group except Africa is approximately normal, and because the Africa group does not deviate too much, we can continue with ANOVA.

The ANOVA test shows a very low p-value, so we reject the null hypothesis that the mean official finish times are the same. To find which groups differ, we run a post hoc test.

At the 95% confidence level, we fail to reject the null hypothesis of equal finishing times for two pairs of groups: Oceania and Americas, and Oceania and Europe. For every other pair, we reject the null hypothesis that the finishing times are the same. Therefore, runners from Oceania and the Americas, and from Oceania and Europe, show no significant difference in finish time, while the other continent groups differ.

Then we remove outliers to see whether anything changes.

We remove individuals who finished the marathon extremely slowly.

After removing outliers, we test for normality and run the ANOVA test again.

The result shows that after removing outliers the p-value is still extremely low, so we reject the null hypothesis that finish times are the same. The post hoc test gives the same results as before: apart from Oceania and Americas, and Oceania and Europe, the groups differ in finish time.

## Outliers identified: 169
## Proportion (%) of outliers: 0.7
## Mean of the outliers: 335.47
## Mean without removing outliers: 235.58
## Mean if we remove outliers: 234.93
## Outliers successfully removed

Next we explore whether the results are the same among the fast runners. We subset the runners with the shortest 1/4 of finish times. Runners who finished the race in less than 207 minutes are in the top 1/4.
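Selecting the fastest quarter comes down to finding the first quartile of finish time and keeping the runners below it. The Python sketch below shows the idea on toy data. The report did this in R, and the 207-minute cutoff above comes from the real data, not from this example. The function name is hypothetical, and `method="inclusive"` is chosen on the assumption that it matches R's default quantile definition.

```python
import statistics


def fastest_quarter(times):
    """Return the first-quartile cutoff and the times strictly below it."""
    q1 = statistics.quantiles(times, n=4, method="inclusive")[0]
    return q1, [t for t in times if t < q1]


# Toy finish times in minutes (not the marathon data): 200, 201, ..., 300
times = list(range(200, 301))
cutoff, fast = fastest_quarter(times)
print(cutoff, len(fast))
```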

Boxplot shows African runners still run faster than other five groups.

We run the normality tests as before. Now all groups are approximately normally distributed. The ANOVA test leads us to reject the null hypothesis that all groups have the same mean. The post hoc test shows that the African group has a different finish time from every other group, and that the remaining groups have the same finish time.

Last but not least, we explore whether continent and finish time are independent in the fast group. We ran a chi-squared test, and the result shows a small p-value, so we reject the null hypothesis that the two are independent. Therefore, finish time is associated with continent.

## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  conttable
## X-squared = 15051, df = NA, p-value = 0.0004998
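The p-value in the output above has a simple reading. With a simulated p-value, the estimate is commonly computed as (1 + number of simulated statistics at least as extreme as the observed one) / (replicates + 1). Assuming that convention applies here, 0.0004998 is 1/2001, the smallest value 2000 replicates can produce, meaning none of the simulated tables was as extreme as the observed one. A short Python check, with a hypothetical function name:

```python
def simulated_p_value(n_as_extreme, replicates):
    """Monte Carlo p-value with the usual +1 correction: (1 + hits) / (replicates + 1)."""
    return (1 + n_as_extreme) / (replicates + 1)


# Zero simulated statistics as extreme as the observed one, 2000 replicates
print(round(simulated_p_value(0, 2000), 7))
```

A consequence is that the reported value is only an upper bound set by the number of replicates. Running more replicates would lower the smallest p-value the simulation can report.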

Analysis of Home State

SMART Questions

I decided to focus on how much the state a runner lives in affects average marathon time. Several specific questions fall under that broader question:

How do runners from Massachusetts, the home state of the marathon, compare to the rest of the runners?

Are the top three states’ average finishing times similar?

Are there states that are particularly faster or slower than others?

Are there any trends across different regions?

We’ll explore elements of all of these questions in the data analysis below.

Data Ingest and Cleaning

First, let’s read in the data for 2017. I’m using the standard R ‘read.csv’ rather than the readr ‘read_csv’, since that package by default converts all times to a specific time of day.

We have a dataframe with 26,410 total entries and 25 fields. The data dictionary for all of these fields is included in the data summary and introduction.

Let’s make a few changes to the “Official.Time” field, which represents the finishing time for each runner. This will allow us to display the data as a new field in minutes called “Official.Time.Min”. We’ll use the lubridate package to convert a character time in ‘Hours:Minutes:Seconds’ format to a number of seconds, which we will divide by 60 to give us a total run time in minutes. This will let us compare running times more easily.
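The conversion itself is simple arithmetic. The Python sketch below shows the same idea without lubridate, which is what the report actually used in R. The function name is hypothetical, and the example input borrows one of the wave-break times quoted earlier in the report.

```python
def time_to_minutes(hms):
    """Convert an 'H:MM:SS' string to total minutes."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return total_seconds / 60


print(round(time_to_minutes("3:10:43"), 2))
```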

Basic Descriptive Statistics

For this section of analysis, we’ll first explore the data and see how consistent the average race times are across the states. We’ll also look at how many runners come from each state and ensure there are enough runners from each state to be meaningful. Then, I’ll look at the three states with the most runners, and run an ANOVA comparison on the means of their times. To do this, I only need the fields ‘Official.Time.Min’ and ‘State’, so I’ll subset down to save some memory since there are over 26,000 observations. At first, I will include the ‘Country’ field so I can subset down to just the United States.

From running the summary function on our data, we learn some interesting information about where our runners come from. Nearly 21,000 of our 26,000 runners come from the United States, with the second-best represented country being Canada. Additionally, from looking at the states, Massachusetts has over 4500 runners, more than double the next state. There are also almost 4000 blank “states”, which makes sense for runners from countries with administrative regions other than states.

When we look at the descriptive statistics from the official running time in minutes, we see evidence of a definite right skew. The median is 231.66, but the mean is 238.06, suggesting a significant right tail. The standard deviation is 42.15.

Now let’s examine just runners from the United States, the focus for this analysis. It looks like we have 20,945 runners from the United States. Unsurprisingly, we are left with very similar summary statistics because US runners were such a large portion of the runners. The median is 233.12, but the mean is 239.7, again suggesting a significant right tail. The standard deviation is 42.65.

We are left with 57 total states in the dataframe. In addition to the 50 US states, there are also runners from:

AA, AE, and AP: These are overseas postal codes for Americans in the Americas, Europe, and the Pacific and are usually used by members of the US military.

DC: District of Columbia

GU: Guam

PR: Puerto Rico

VI: US Virgin Islands

Initial Visualization and More Statistics

Now let’s look at some basic visualizations. We’ll want to first look at the total number of runners for each state, and the average race time for each state. Since I will be closely examining two variables (count of runners per state and average finishing time per state) across the same data, I will always use yellows/golds when color represents counts of runners and blues when color represents average finishing times. As these are the colors of the Boston Marathon, I thought them appropriate for the analysis.

So we can see that most states have fewer than 500 participants, and only a few have more than 1000. Most states seem to have an average race time between 220 and 240 minutes, with a few outliers. But we don’t know how many runners were in the states that are outliers. It could be that there were just a couple of runners who ran very fast or slow. We’ll have to look at these possibilities later on in our analysis.

Let’s merge our Total Runners count and Average Time into one DF, and then merge this data into our ‘US’ dataframe with all of the observations. This will make it much easier to plot different visualizations by allowing us to set a threshold and only plot states where the count of runners is above that threshold.

We can now make a boxplot for all of the states with more than a certain number of runners in the 2017 race. After trying several thresholds, I decided 60 runners was a good threshold for analysis. It included most states but removed several states with a small number of runners and a significantly different average race time.
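The per-state count, per-state average, and runner threshold described above can be sketched in a few lines. This is a Python illustration on toy rows. The report built these summaries in R, and the function name and the (state, minutes) row layout are assumptions made for the example.

```python
from collections import defaultdict


def state_summary(rows, min_runners=60):
    """Return {state: (count, mean_minutes)} for states with more than min_runners runners."""
    times = defaultdict(list)
    for state, minutes in rows:
        times[state].append(minutes)
    return {
        state: (len(vals), sum(vals) / len(vals))
        for state, vals in times.items()
        if len(vals) > min_runners
    }


# Toy rows (not the marathon data): 61 runners from "MA", 2 from "WY"
rows = [("MA", 240.0)] * 61 + [("WY", 180.0), ("WY", 200.0)]
print(state_summary(rows))
```

In the toy data, the two fast "WY" runners would give that state an average of 190 minutes. The threshold drops them, which is the effect described above: a handful of runners should not define a state's average.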

I’ve also shaded the boxplots by the number of runners so that darker golds have more participants than boxplots with a lighter yellow. MA definitely has the most participants, and a much slower time. Additionally, the interquartile ranges for some other states near MA, like RI and NH, are definitely slower than those of other states. Also, runners from Colorado seem to be FAST, but it looks like there aren’t that many participants.

Now that we have this data prepared we can make a bar chart to better review the data. We’ll plot the height of each bar as a function of count, and color by the average time of the runners. We’ll also make two copies of this chart. The first will be sorted by total number of runners per state, and the second by average completion time. We’ll also only plot states that had more than 60 runners to prevent one or two “fast” runners from a state being over-weighted.

Now we are seeing some trends for different states. Four of the ten slowest states (MA, RI, NH, ME) are in New England. It looks like runners from New England, and MA specifically, are much slower. Also, CO looks pretty fast, and it’s easy to see here that they had over 500 runners.

Before we go on to some statistical tests to see if we can prove that New England is slower, let’s make a map to get some more info.

Mapping and Further Analysis

Let’s make a plot of average finishing time across all of the states. Since a few finishing times in states with small numbers of runners can dramatically skew the average, let’s keep looking only at states where there are more than 60 runners. States with fewer than 60 runners will be greyed out.

A plot with our other variable of interest, total number of runners by state, was not very interesting because MA has so many more runners than any other state. Even the #2 state, California, has almost double the runners of the #3 state, New York. Therefore, I kept the plotting focused on the average finishing time and excluded the map with counts by total runners.

I also diverged from my yellow/blue color scheme and chose to use the ‘viridis’ color palette to better identify the contrasts between states.

Looks great! We can see that New England does look a little slower, so let’s do another map with just New England.

For states in New England, we definitely see slower times. In fact, the mean finishing time for New Englanders is 262.5597352 minutes compared to a finishing time of 262.5597352 minutes for Non-New Englanders.

Statistical Analysis

So the top states are MA, CA, and NY. Let’s first focus on these three states and see if their average finishing times are different. Let’s start by making a boxplot for each of these three states.

The boxplot really shows that MA runners are slower than those from CA and NY. The histogram for MA is also much less skewed to the right than CA or NY. This suggests there’s something different about runners from MA, aside from there being so many more runners from MA.

Testing for Normality

We can create QQ plots for all three of these datasets to see if they are normal. After reviewing all three, the MA dataset is the most normal of the three, with CA and NY having significant right-skews. It makes sense to sample all three of these datasets for our ANOVA testing. Let’s sample 50 observations and bind them together to a new dataframe.

ANOVA Test

Now let’s look at the ANOVA test comparing the means of the top three states. The null hypothesis here will be that there is no difference in means between the three states. The alternative hypothesis is that there is a difference. If the p-value is sufficiently small (below .05 for a 95% confidence level), we will reject the null hypothesis and conclude there is a difference between at least some of the means.

After completing the test, we find an incredibly small p-value of 1.7505299 × 10^-6, so we reject the null hypothesis that there is no difference between the means. There is something different about these means. Let’s take a look at Tukey’s multiple comparison of means.

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Official.Time.Min ~ State, data = samples)
## 
## $State
##            diff       lwr       upr     p adj
## MA-CA  32.68267  13.80031  51.56502 0.0002017
## NY-CA  -7.83900 -26.72136  11.04336 0.5887870
## NY-MA -40.52167 -59.40402 -21.63931 0.0000033

When we look at NY to CA, Tukey’s family-wise comparison shows an incredibly high adjusted p-value. Let’s explore that more.

Two-Sample T-Test

Let’s conduct a two-sample T-test comparing these two similar means for CA and NY. The null hypothesis for this test will be that CA and NY have the same mean finishing time. The alternative hypothesis is that NY and CA have different average finishing times.

The T-test returns a p-value of 0.25002, well above 0.05. Thus, we fail to reject the null hypothesis, which stated there was no difference in the means between CA and NY.

Analysis of average finishing times across the US without New England

If we filter out states in New England, how different are the average finishing times for the rest of the country? To answer this question, we first have to do some data cleaning. We can limit our analysis to the 35 non-New England states that had more than 60 runners. Then, we’ll take a random sample of 50 runners from each of those states, and finally conduct an ANOVA analysis on that sample data grouped by state. The null hypothesis is that there is no difference amongst means from non-New England states with at least 60 runners. The alternative is that there is a difference.

After running the analysis, we get a high p-value of 0.16196, meaning we fail to reject the null hypothesis. It looks like there is a high level of consistency for the average run time across the US outside of New England. When I ran this analysis before on states with fewer total runners at the race, my answers were far more variable. This is further evidence that we need to filter out states with only a handful of runners, as a few fast or slow finishing times can affect the entire state’s average.

Analysis of the rest of the US vs. New England

The final piece of analysis we’ll do is a two-sample T-test between New England states and non-New England states to compare average finishing times. The null hypothesis is that there is no difference in mean finishing times for states in New England compared to the rest of the US, while the alternative hypothesis is there will be a difference. Based on our previous analysis, we expect there to be a difference, but we should run the test to be sure.

The test returns a very low p-value of 7.7607268 × 10^-6, telling us we should reject the null hypothesis. Average race times from New England are different from averages around the rest of the country.